
Conversation


@mohamedawnallah mohamedawnallah commented Oct 8, 2025

Change Description

The original regression was caused by changeset 16a8b623b.

Closes #10180.
Closes #9081.
Closes #8157 – can't reproduce that issue on 18.3+ releases.

Resolves Discussion #10146.
(Can't Reproduce) Resolves Discussion #9094.
(Overlap) Resolves Discussion #10227.
(Perhaps Overlap) Resolves Discussion #9908.

Related Issues:
(Can't Reproduce) #9108.
(Perhaps Overlap) #10225.

Steps to Test

Steps for reviewers to follow to test the change.

Pull Request Checklist

Testing

  • Your PR passes all CI checks.
  • Tests covering the positive and negative (error paths) are included.
  • Bug fixes contain tests triggering the bug to prevent regressions.

Code Style and Documentation

📝 Please see our Contribution Guidelines for further guidance.

@gemini-code-assist

Summary of Changes

Hello @mohamedawnallah, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This PR optimizes neutrino rescans by integrating the wallet's birthday into the rescan logic, significantly speeding up blockchain synchronization for newly created wallets. It modifies neutrinonotify and chainview components to accept and utilize a walletBirthday parameter, which is then passed as a neutrino.StartTime option to limit the rescan range. Associated test files are updated to reflect these changes.

Highlights

  • Intent: This pull request aims to optimize the neutrino rescan process by leveraging the wallet's creation date (birthday). By passing the wallet birthday to the rescan, the system can avoid scanning blocks that predate the wallet's existence, thereby fixing slow blockchain sync issues, especially when a wallet is created before the neutrino client has completed its header synchronization. This addresses the problem described in issue [bug]: wallet creation slows sync if run before neutrino headers complete #10180.
  • Changes: The core change involves propagating the wallet's birthday (time.Time) through various components that initiate neutrino rescans. This includes modifications to the neutrinonotify package for chain notifications and the chainview package for filtered chain views. The neutrino.StartTime rescan option is now utilized to limit the scan range based on the wallet's birthday. Test files have also been updated to accommodate the new walletBirthday parameter, typically setting it to the Unix epoch for full historical scanning in tests.
  • Reviewer Activity: No specific reviewer activity has been recorded or provided in the context for this pull request.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an optimization for neutrino rescan by using the wallet's birthday. The changes are well-contained and the logic is sound. I've identified a few areas for improvement, mainly related to code formatting and documentation, to align with the repository's style guide.

@mohamedawnallah mohamedawnallah force-pushed the optimize-neutrino-rescan branch from 0f26e5b to dae9086 Compare October 8, 2025 17:12
@saubyk saubyk added this to the v0.20.0 milestone Oct 8, 2025
@saubyk saubyk added this to lnd v0.20 Oct 8, 2025
@saubyk saubyk moved this to In progress in lnd v0.20 Oct 8, 2025

Roasbeef commented Oct 8, 2025

Can we get a brief recap in the OP re: what happened, was it a regression, why this fixes it?

Another follow up: if the regression was introduced in btcwallet, is that a better way to fix it?

How can we develop a regression test for this issue?

@saubyk saubyk moved this from In progress to In review in lnd v0.20 Oct 9, 2025
@saubyk saubyk moved this from In review to In progress in lnd v0.20 Oct 9, 2025

mohamedawnallah commented Oct 9, 2025

Hi @shocknet-justin, would it be possible to try this potential patch to see if it resolves your issue (#10180)? I can try to write a reproducible test (admittedly not the easiest thing to do), but having a +1/-1 on whether this changeset works would be helpful.


hsjoberg commented Oct 9, 2025

Thank you for looking into this one @mohamedawnallah. I believe I'm experiencing the same issue as @shocknet-justin.

I tried running your patches, but it seems like there's still weird behavior on first run.

2025-10-10 01:47:08.475 [INF] BTCN blockmanager.go:1163: Successfully got filter headers for 918 checkpoints
2025-10-10 01:47:08.475 [INF] BTCN blockmanager.go:643: Fully caught up with cfheaders at height 918000, waiting at tip for new blocks
2025-10-10 01:47:08.476 [INF] BTCN blockmanager.go:733: Attempting to fetch set of un-checkpointed filters at height=918354, hash=00000000000000000001ac0a0f9ad634af778a391b799b80ff644c851d378aa1
2025-10-10 01:47:18.483 [INF] BTCN headerlogger.go:67: Verified 96000 filter headers in the last 12.72s (height 918001, 2025-10-07 10:32:03 +0400 +04)
2025-10-10 01:47:18.483 [INF] BTCN manager.go:327: Canceling block subscription: id=2
2025-10-10 01:47:18.572 [INF] NTFN neutrino.go:693: New block: height=14001, sha=00000000b44efc0402c4841e13c29e2843776abd8ca1a728b4d12a092966e87a
2025-10-10 01:47:18.661 [INF] NTFN neutrino.go:693: New block: height=14002, sha=000000001e4c88a360df5e1f5b4f0b91676dc552acb2343f71068d4b464801f7
2025-10-10 01:47:18.989 [INF] NTFN neutrino.go:693: New block: height=14003, sha=0000000095f738799810fbf31566faac20087ffb00598a35a02ce2166e387117
2025-10-10 01:47:19.321 [INF] NTFN neutrino.go:693: New block: height=14004, sha=0000000080863585423586350af531c858597cb5b9e59f0ea3ff6d63678f62e6
[Continues like this...]

The issue can be seen with:

./lnd-debug --lnddir="./lnddir" --noseedbackup --nolisten --bitcoin.active --bitcoin.mainnet --bitcoin.node=neutrino --feeurl="https://nodes.lightning.computer/fees/v1/btc-fee-estimates.json" --routing.assumechanvalid --tlsdisableautofill --neutrino.addpeer=asia.blixtwallet.com --neutrino.addpeer=europe.blixtwallet.com --neutrino.addpeer=btcd.lnolymp.us --neutrino.addpeer=btcd-mainnet.lightning.computer

Let me know if I can help you or perhaps test something.

@shocknet-justin

I've got some time-sensitive materials I need to finish today but will test in the next day or two. It looks like Hampus is still reproducing it, however; the --noseedbackup flag, I believe, should trigger the wallet creation as quickly as my deployment scripts would.

@mohamedawnallah mohamedawnallah force-pushed the optimize-neutrino-rescan branch from dae9086 to 16e24f1 Compare October 12, 2025 01:27

mohamedawnallah commented Oct 12, 2025

Benchmarking Report

With the changeset in this PR

With the changeset in this PR, the latest LND v0.20.0 release candidate returns to normal initial sync times: a sync that previously took days to complete under ideal environmental conditions now takes just over a minute.

[charts: Execution Time (seconds), Peak Memory (MB), CPU Usage (%)]
Benchmarks in Raw Format
dev@dev:~/lnd$ sudo ./compare_lnd_versions.sh
==========================================
LND Version Comparison Benchmark
==========================================
Comparing 4 versions:
  - v0.18.3-beta
  - v0.18.4-beta
  - v0.18.5-beta
  - v0.20.0-beta.rc1-after

Each version will run in isolated namespaces
for fair comparison.

>>> Running v0.18.3-beta (1/4)...
==========================================
Benchmarking LND v0.18.3-beta
Using temporary storage: /tmp/lnd-bench-v0.18.3-beta-tQ9SWS
Resource Limits:
  - CPU: 4 cores
  - Memory: 8G
==========================================
Starting LND in isolated namespace with tmpfs...
Started LND (PID: 851327)
Monitoring sync status...

To manually check status, run:
  sudo nsenter --target 851327 --pid --mount --uts --ipc \
    /home/dev/lnd/lncli-debug-v0.18.3-beta --lnddir=/tmp/lnd_data_ns/lnd getinfo

[Sample 0] Block: unknown | synced: false | Memory: 348.76 MB | CPU: 0%
[Sample 1] Block: unknown | synced: false | Memory: 437.65 MB | CPU: 223.00%
[Sample 2] Block: 150000 | synced: false | Memory: 119.67 MB | CPU: 190.00%
[Sample 3] Block: 243000 | synced: false | Memory: 131.54 MB | CPU: 158.00%
[Sample 4] Block: 337000 | synced: false | Memory: 150.61 MB | CPU: 171.00%
[Sample 5] Block: 412000 | synced: false | Memory: 160.77 MB | CPU: 151.00%
[Sample 6] Block: 464000 | synced: false | Memory: 174.42 MB | CPU: 143.00%
[Sample 7] Block: 536000 | synced: false | Memory: 193.35 MB | CPU: 157.00%
[Sample 8] Block: 635000 | synced: false | Memory: 207.39 MB | CPU: 167.00%
[Sample 9] Block: 700000 | synced: false | Memory: 217.50 MB | CPU: 153.00%
[Sample 10] Block: 764000 | synced: false | Memory: 242.05 MB | CPU: 154.00%
[Sample 11] Block: 822000 | synced: false | Memory: 253.53 MB | CPU: 152.00%
[Sample 12] Block: 914000 | synced: false | Memory: 278.78 MB | CPU: 144.00%
synced_to_chain
✓ synced_to_chain: true (block height: 918649)
Stopping LND...

========================================
BENCHMARK RESULTS - v0.18.3-beta
========================================
Execution Time: 70.434696628 seconds
Peak Memory Usage: 437.65 MB
Average Memory Usage: 231.76 MB
Average CPU Usage: 157.76%
Samples Collected: 14

Results saved to: benchmark_v0.18.3-beta.json
Temporary directory will be cleaned up automatically
Cleaning up temporary directory: /tmp/lnd-bench-v0.18.3-beta-tQ9SWS

Waiting 10 seconds before next benchmark...

>>> Running v0.18.4-beta (2/4)...
==========================================
Benchmarking LND v0.18.4-beta
Using temporary storage: /tmp/lnd-bench-v0.18.4-beta-2Z0bCV
Resource Limits:
  - CPU: 4 cores
  - Memory: 8G
==========================================
Starting LND in isolated namespace with tmpfs...
Started LND (PID: 852086)
Monitoring sync status...

To manually check status, run:
  sudo nsenter --target 852086 --pid --mount --uts --ipc \
    /home/dev/lnd/lncli-debug-v0.18.4-beta --lnddir=/tmp/lnd_data_ns/lnd getinfo

[Sample 0] Block: unknown | synced: false | Memory: 363.35 MB | CPU: 0%
[Sample 1] Block: unknown | synced: false | Memory: 454.38 MB | CPU: 240.00%
[Sample 2] Block: 197000 | synced: false | Memory: 118.01 MB | CPU: 183.00%
[Sample 3] Block: 275000 | synced: false | Memory: 132.96 MB | CPU: 158.00%
[Sample 4] Block: 351000 | synced: false | Memory: 150.52 MB | CPU: 153.00%
[Sample 5] Block: 432000 | synced: false | Memory: 163.67 MB | CPU: 158.00%
[Sample 6] Block: 508000 | synced: false | Memory: 178.56 MB | CPU: 164.00%
[Sample 7] Block: 583000 | synced: false | Memory: 190.03 MB | CPU: 156.00%
[Sample 8] Block: 635000 | synced: false | Memory: 209.26 MB | CPU: 148.00%
[Sample 9] Block: 705000 | synced: false | Memory: 226.96 MB | CPU: 151.00%
[Sample 10] Block: 791000 | synced: false | Memory: 248.11 MB | CPU: 161.00%
[Sample 11] Block: 816000 | synced: false | Memory: 255.58 MB | CPU: 132.00%
[Sample 12] Block: 918000 | synced: false | Memory: 251.60 MB | CPU: 132.00%
synced_to_chain
✓ synced_to_chain: true (block height: 918649)
Stopping LND...

========================================
BENCHMARK RESULTS - v0.18.4-beta
========================================
Execution Time: 70.358434022 seconds
Peak Memory Usage: 454.38 MB
Average Memory Usage: 234.59 MB
Average CPU Usage: 157.61%
Samples Collected: 14

Results saved to: benchmark_v0.18.4-beta.json
Temporary directory will be cleaned up automatically
Cleaning up temporary directory: /tmp/lnd-bench-v0.18.4-beta-2Z0bCV

Waiting 10 seconds before next benchmark...

>>> Running v0.18.5-beta (3/4)...
==========================================
Benchmarking LND v0.18.5-beta
Using temporary storage: /tmp/lnd-bench-v0.18.5-beta-XHFt2i
Resource Limits:
  - CPU: 4 cores
  - Memory: 8G
==========================================
Starting LND in isolated namespace with tmpfs...
Started LND (PID: 852853)
Monitoring sync status...

To manually check status, run:
  sudo nsenter --target 852853 --pid --mount --uts --ipc \
    /home/dev/lnd/lncli-debug-v0.18.5-beta --lnddir=/tmp/lnd_data_ns/lnd getinfo

[Sample 0] Block: unknown | synced: false | Memory: 352.79 MB | CPU: 0%
[Sample 1] Block: unknown | synced: false | Memory: 475.08 MB | CPU: 244.00%
[Sample 2] Block: 207000 | synced: false | Memory: 116.33 MB | CPU: 180.00%
[Sample 3] Block: 256000 | synced: false | Memory: 138.87 MB | CPU: 144.00%
[Sample 4] Block: 347000 | synced: false | Memory: 156.42 MB | CPU: 163.00%
[Sample 5] Block: 434000 | synced: false | Memory: 166.09 MB | CPU: 163.00%
[Sample 6] Block: 514000 | synced: false | Memory: 179.01 MB | CPU: 161.00%
[Sample 7] Block: 571000 | synced: false | Memory: 190.89 MB | CPU: 150.00%
[Sample 8] Block: 635000 | synced: false | Memory: 208.35 MB | CPU: 143.00%
[Sample 9] Block: 721000 | synced: false | Memory: 221.60 MB | CPU: 163.00%
[Sample 10] Block: 795000 | synced: false | Memory: 237.88 MB | CPU: 162.00%
[Sample 11] Block: 854000 | synced: false | Memory: 262.40 MB | CPU: 156.00%
synced_to_chain
✓ synced_to_chain: true (block height: 918649)
Stopping LND...

========================================
BENCHMARK RESULTS - v0.18.5-beta
========================================
Execution Time: 65.119188935 seconds
Peak Memory Usage: 475.08 MB
Average Memory Usage: 230.24 MB
Average CPU Usage: 162.00%
Samples Collected: 13

Results saved to: benchmark_v0.18.5-beta.json
Temporary directory will be cleaned up automatically
Cleaning up temporary directory: /tmp/lnd-bench-v0.18.5-beta-XHFt2i

Waiting 10 seconds before next benchmark...

>>> Running v0.20.0-beta.rc1-after (4/4)...
==========================================
Benchmarking LND v0.20.0-beta.rc1-after
Using temporary storage: /tmp/lnd-bench-v0.20.0-beta.rc1-after-LeHuC4
Resource Limits:
  - CPU: 4 cores
  - Memory: 8G
==========================================
Starting LND in isolated namespace with tmpfs...
Started LND (PID: 854003)
Monitoring sync status...

To manually check status, run:
  sudo nsenter --target 854003 --pid --mount --uts --ipc \
    /home/dev/lnd/lncli-debug-v0.20.0-beta.rc1-after --lnddir=/tmp/lnd_data_ns/lnd getinfo

[Sample 0] Block: unknown | synced: false | Memory: 356.35 MB | CPU: 0%
[Sample 1] Block: unknown | synced: false | Memory: 489.78 MB | CPU: 249.00%
[Sample 2] Block: 207000 | synced: false | Memory: 144.97 MB | CPU: 222.00%
[Sample 3] Block: 278000 | synced: false | Memory: 172.72 MB | CPU: 203.00%
[Sample 4] Block: 355000 | synced: false | Memory: 207.43 MB | CPU: 221.00%
[Sample 5] Block: 436000 | synced: false | Memory: 242.13 MB | CPU: 224.00%
[Sample 6] Block: 510000 | synced: false | Memory: 269.44 MB | CPU: 216.00%
[Sample 7] Block: 577000 | synced: false | Memory: 303.05 MB | CPU: 222.00%
[Sample 8] Block: 657000 | synced: false | Memory: 324.72 MB | CPU: 228.00%
[Sample 9] Block: 721000 | synced: false | Memory: 354.95 MB | CPU: 214.00%
[Sample 10] Block: 771000 | synced: false | Memory: 378.89 MB | CPU: 208.00%
[Sample 11] Block: 795000 | synced: false | Memory: 405.87 MB | CPU: 183.00%
[Sample 12] Block: 872000 | synced: false | Memory: 392.42 MB | CPU: 227.00%
synced_to_chain
✓ synced_to_chain: true (block height: 918649)
Stopping LND...

========================================
BENCHMARK RESULTS - v0.20.0-beta.rc1-after
========================================
Execution Time: 70.416534861 seconds
Peak Memory Usage: 489.78 MB
Average Memory Usage: 308.40 MB
Average CPU Usage: 208.15%
Samples Collected: 14

Results saved to: benchmark_v0.20.0-beta.rc1-after.json
Temporary directory will be cleaned up automatically
Cleaning up temporary directory: /tmp/lnd-bench-v0.20.0-beta.rc1-after-LeHuC4

==========================================
COMPARISON RESULTS
==========================================

Metric                    | v0.18.3-beta      | v0.18.4-beta      | v0.18.5-beta      | v0.20.0-beta.rc1-after
------------------------- | ----------------- | ----------------- | ----------------- | -----------------
Execution Time (s)        | 70.434696628      | 70.358434022      | 65.119188935      | 70.416534861
Peak Memory (MB)          | 437.65            | 454.38            | 475.08            | 489.78
Average Memory (MB)       | 231.76            | 234.59            | 230.24            | 308.4
Average CPU (%)           | 157.76            | 157.61            | 162               | 208.15

Summary:
--------
✓ Fastest: v0.18.5-beta (65.119188935s)
✓ Lowest Peak Memory: v0.18.3-beta (437.65 MB)
✓ Lowest Average CPU: v0.18.4-beta (157.61%)

Detailed logs available in:
  lnd_v0.18.3-beta.log
  lnd_v0.18.4-beta.log
  lnd_v0.18.5-beta.log
  lnd_v0.20.0-beta.rc1-after.log
Conclusions

Execution time for v0.20.0-beta.rc1 with the changeset in this PR remains competitive at 70.42 seconds, matching the non-regression performance of v0.18.3-beta and v0.18.4-beta (70.43s and 70.36s respectively) while being only 8.1% slower than v0.18.5-beta (65.12s). Most importantly, this represents a dramatic recovery from the multi-day sync times that plagued v0.19.0-beta through v0.19.3-beta and the current v0.20.0 release candidates: what previously took days now completes in just over a minute, making LND v0.20.0 fully operational again.

Constraints

  • Each LND benchmark runs in its own isolated Linux namespace with the same upper bounds on cgroup resources, for fair and representative results.

Reproducibility

  • Choose the target release branches to benchmark against, build LND in debug mode using make build, and then tweak the compare_lnd_versions.sh script to match the generated artifact names. You can then run it to reproduce these benchmarking results. See the "Benchmarks in Raw Format" section above for more verbose, helpful output.


mohamedawnallah commented Oct 12, 2025

cc @hsjoberg, @shocknet-justin: pushed a new changeset that will solve the wallet sync issues you experienced, as well as #9081.

@mohamedawnallah mohamedawnallah changed the title neutrinonotify+chainview: optimize neutrino rescan with wallet birthday rpcserver: avoid false negatives in dispatcher during the initial sync Oct 12, 2025
@shocknet-justin

@mohamedawnallah I think this partially solved the issue, in that lnd reports synced_to_chain=true after headers complete, which should unblock our deployments, but what I understand to be the rescan is still taking place after headers sync:

[screenshot]

@mohamedawnallah

@mohamedawnallah I think this partially solved the issue, in that lnd reports synced_to_chain=true after headers complete, which should unblock our deployments, but what I understand to be the rescan is still taking place after headers sync:

[screenshot]

It took me two hours to identify the underlying root cause. The changeset I submitted was addressing the symptom rather than the core issue. I'll submit a new changeset tomorrow that fully resolves the root cause. Thanks for being responsive during the resolution process!

rpcserver.go Outdated
// Sync state transitions as dispatcher catches up:
// Gap >10 blocks: synced (ignore dispatcher lag during catchup)
// Gap 1-10 blocks: not synced (strict: must be at exact tip)
// Gap 0 blocks: synced (dispatcher caught up)
Member


blockbeatDispatcher is receiving blocks and dispatching them to blockbeat subscribers, so I don't think it's safe to assume we've synced. It's likely that one of the subsystems is still busy processing the block, causing the blockbeat to stall. We should instead identify what's causing the stall and properly fix it, rather than giving the user the false impression that lnd is synced.


mohamedawnallah commented Oct 13, 2025

Okay, today's experiments haven't come to a meaningful fix. I am 100% confident that the main changeset that introduced this regression is this one: 16a8b623b. I guess it is more of a race condition / not-up-to-date height value after the chain notifier was moved from its original place.

Building ANY LND before that commit will NOT TRIGGER any premature wallet rescanning.
Building ANY LND during/after that commit will TRIGGER the premature wallet rescanning.

Will try tomorrow with different experiments :)

@mohamedawnallah mohamedawnallah changed the title rpcserver: avoid false negatives in dispatcher during the initial sync rpcserver: resolve root cause of premature wallet rescanning Oct 14, 2025
@mohamedawnallah mohamedawnallah force-pushed the optimize-neutrino-rescan branch from 16e24f1 to a93c2d3 Compare October 15, 2025 20:26

mohamedawnallah commented Oct 15, 2025

Alrighty, I have submitted a new changeset and it will solve the premature rescanning issue. Something you will notice for now when trying this is that the LND RPC server will not start until the headers are fully synced to chain, which means issuing any RPC calls/commands against LND won't be responsive while headers are syncing. That is noticeable on the first run; on subsequent runs it is negligible unless there is a big timing difference between those runs.

Later today I will post a root cause analysis report that will help answer the following genuine questions:

  1. What triggers this premature rescanning?
  2. Why, when I restart LND, does the premature rescanning not happen?
  3. Is syncing headers before starting the RPC server really necessary? If not, what needs to happen to allow this (if any input)? If it is indeed necessary, how can the gap between the neutrino p2p header sync and LND RPC server startup be significantly reduced (on the first LND run)?
  4. How would using other chain backends, e.g. btcd or bitcoind, affect the gap between initial sync and LND RPC server startup?
  5. Which LND releases are impacted by this?

For now, you can experiment with this changeset to see if it solves your wallet sync issues.

cc @shocknet-justin, @hsjoberg

@shocknet-justin

Our workaround is already not to create the wallet until it's done, but that's not ideal because it slows the onboarding flow relative to prior versions, though it's not the end of the world either since it's usually only by a minute or two.

The lack of RPC at all is a bigger problem, since we have to parse logs to know how things are progressing; otherwise we're polling into the void... and both options can be flaky.

An RPC for the state that is available immediately would solve the latter problem.


mohamedawnallah commented Oct 15, 2025

I understand this is not the ideal solution. We are iterating on what the ideal solution could look like. Our process has been:

  1. First, we verified whether this is an existing issue for current users
  2. Then, we identified and analyzed whether this is a regression, and if so, what the root cause might be
  3. Now, we are evaluating intentional changes to resolve that root cause

Why does the current changeset move headers syncing before server start?

Let me explain by comparing the old behavior (non-regression) with the new behavior.

Old Behavior (Non-Regression)

We can use commit 1dfb5a0 as a representative example of the LND version where premature rescanning was not happening.

In that version, the sequence was:

  1. RPC server starts first
  2. Full syncing happens as part of the daemon server
  3. After headers are fully synced, the server calls Chain Notifier, which starts the Neutrino rescanning process
  4. Rescanning begins from the current height (which is confident to be recent since we just synced)

References:

  • lnd/lnd.go

    Lines 694 to 751 in 1dfb5a0

    type syncResult struct {
        synced        bool
        bestBlockTime int64
        err           error
    }

    var syncedResChan = make(chan syncResult, 1)

    for {
        // We check if the wallet is synced in a separate goroutine as
        // the call is blocking, and we want to be able to interrupt it
        // if the daemon is shutting down.
        go func() {
            synced, bestBlockTime, err := activeChainControl.Wallet.
                IsSynced()

            syncedResChan <- syncResult{synced, bestBlockTime, err}
        }()

        select {
        case <-interceptor.ShutdownChannel():
            return nil

        case res := <-syncedResChan:
            if res.err != nil {
                return mkErr("unable to determine if wallet "+
                    "is synced", res.err)
            }

            ltndLog.DebugS(ctx, "Syncing to block chain",
                "best_block_time", time.Unix(res.bestBlockTime, 0),
                "is_synced", res.synced)

            if res.synced {
                break
            }

            // If we're not yet synced, we'll wait for a second
            // before checking again.
            select {
            case <-interceptor.ShutdownChannel():
                return nil
            case <-time.After(time.Second):
                continue
            }
        }

        break
    }

    _, bestHeight, err = activeChainControl.ChainIO.GetBestBlock()
    if err != nil {
        return mkErr("unable to determine chain tip", err)
    }

    ltndLog.InfoS(ctx, "Chain backend is fully synced!",
        "end_height", bestHeight)
  • lnd/server.go

    Lines 2129 to 2134 in 1dfb5a0

    cleanup = cleanup.add(s.cc.ChainNotifier.Stop)

    if err := s.cc.ChainNotifier.Start(); err != nil {
        startErr = err
        return
    }

New Behavior (Regression)

With commit 16a8b62, the order of operations is FLIPPED:

  1. Chain Notifier (which triggers Neutrino rescanning) is called first in newServer, before starting the RPC server
  2. Wallet syncing happens after the RPC server starts

This means the height that Chain Notifier accesses comes from BuildChainControl, NOT from the last synced height. That's why we're seeing rescanning start from ~100,000 headers.

References:

  • lnd/lnd.go

    Lines 496 to 502 in 6ade31d

    activeChainControl, cleanUp, err := implCfg.BuildChainControl(
    partialChainControl, walletConfig,
    )
    if err != nil {
    return mkErr("error loading chain control", err)
    }
  • lnd/server.go

    Lines 736 to 747 in 6ade31d

    // Start the low-level services once they are initialized.
    //
    // TODO(yy): break the server startup into four steps,
    // 1. init the low-level services.
    // 2. start the low-level services.
    // 3. init the high-level services.
    // 4. start the high-level services.
    if err := s.startLowLevelServices(); err != nil {
        return nil, err
    }

    currentHash, currentHeight, err := s.cc.ChainIO.GetBestBlock()

Why Not Return to the Old Order?

A natural question: Why not sync first, then call Chain Notifier rescanning in a separate goroutine so it has access to recent height (i.e., move it out of newServer like the earlier variant)? That way, premature rescanning would be avoided and the RPC server wouldn't be blocked.

That's a good question. I don't really know the answer, and I may need @yyforyongyu's input on this. If it turns out that s.cc.ChainNotifier.Start() must be called before the server starts (as currently done in the codebase, shown below), then perhaps there is no other way than syncing headers first (meaning the RPC server wouldn't start yet) and then starting the server so Neutrino rescanning gets the recently synced height. Otherwise, this premature rescanning will continue to happen.

Reference:

  • lnd/server.go

    Lines 2128 to 2150 in 6ade31d

    // startLowLevelServices starts the low-level services of the server. These
    // services must be started successfully before running the main server. The
    // services are,
    // 1. the chain notifier.
    //
    // TODO(yy): identify and add more low-level services here.
    func (s *server) startLowLevelServices() error {
        var startErr error
        cleanup := cleaner{}
        cleanup = cleanup.add(s.cc.ChainNotifier.Stop)

        if err := s.cc.ChainNotifier.Start(); err != nil {
            startErr = err
        }

        if startErr != nil {
            cleanup.run()
        }

        return startErr
    }

@yyforyongyu

Wallet syncing happens after the RPC server starts

The wallet syncing is started before the RPC server starts. By searching SynchronizeRPC and tracing its callsite, we can see that the wallet is started inside BuildChainControl before newServer,

lnd/lnd.go

Line 628 in 6ade31d

server, err := newServer(

and here we just check whether the wallet is synced or not,

lnd/lnd.go

Lines 706 to 713 in 6ade31d

// We check if the wallet is synced in a separate goroutine as
// the call is blocking, and we want to be able to interrupt it
// if the daemon is shutting down.
go func() {
    synced, bestBlockTime, err := activeChainControl.Wallet.
        IsSynced()

    syncedResChan <- syncResult{synced, bestBlockTime, err}
}()

When the node starts, it will first get the current block height, and pass it to other subsystems,

lnd/server.go

Line 747 in 6ade31d

currentHash, currentHeight, err := s.cc.ChainIO.GetBestBlock()

Then other subsystems can get notified about spending of txns.

There is a big improvement we can make, which is laid out here. The idea is that the two rescan processes - one from the wallet and the other from the notifier - can be combined into one, such that we only need to sync blocks once. I think once that's done the startup will be way faster.


mohamedawnallah commented Oct 16, 2025

Wallet syncing happens after the RPC server starts

What I meant by syncing here is full syncing. Running LND in debug mode, you will see that when we reach this point we are actually at ~100,000 headers. The following are the code paths where full syncing actually happens:

lnd/lnd.go

Lines 694 to 751 in 1dfb5a0

type syncResult struct {
    synced        bool
    bestBlockTime int64
    err           error
}

var syncedResChan = make(chan syncResult, 1)

for {
    // We check if the wallet is synced in a separate goroutine as
    // the call is blocking, and we want to be able to interrupt it
    // if the daemon is shutting down.
    go func() {
        synced, bestBlockTime, err := activeChainControl.Wallet.
            IsSynced()

        syncedResChan <- syncResult{synced, bestBlockTime, err}
    }()

    select {
    case <-interceptor.ShutdownChannel():
        return nil

    case res := <-syncedResChan:
        if res.err != nil {
            return mkErr("unable to determine if wallet "+
                "is synced", res.err)
        }

        ltndLog.DebugS(ctx, "Syncing to block chain",
            "best_block_time", time.Unix(res.bestBlockTime, 0),
            "is_synced", res.synced)

        if res.synced {
            break
        }

        // If we're not yet synced, we'll wait for a second
        // before checking again.
        select {
        case <-interceptor.ShutdownChannel():
            return nil
        case <-time.After(time.Second):
            continue
        }
    }

    break
}

_, bestHeight, err = activeChainControl.ChainIO.GetBestBlock()
if err != nil {
    return mkErr("unable to determine chain tip", err)
}

ltndLog.InfoS(ctx, "Chain backend is fully synced!",
    "end_height", bestHeight)

When the node starts, it will first get the current block height, and pass it to other subsystems
currentHash, currentHeight, err := s.cc.ChainIO.GetBestBlock()

I am confident that, in the current codebase, this currentHeight after startLowLevelServices in newServer is not the fully synced one. It is far behind, around the ~100,000-header mark we last reached during the BuildChainControl invocation.

The idea is that, the two rescan processes - one from the wallet and the other from the notifier can be combined into one, such that we only need to sync blocks once.

Since each rescan process holds its own state, running them in parallel and then converging them into one may not be safe and could potentially introduce race conditions.

@yyforyongyu – I would love to hear your thoughts on the best approach this PR can safely take, given my responses above and the time constraints for the LND release.
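The sync-wait loop quoted above (lnd.go lines 694 to 751) follows a standard Go pattern: run the blocking check in a goroutine so the loop stays interruptible by shutdown. A minimal, self-contained sketch of that pattern; the isSynced callback and the 10 ms poll interval are stand-ins, not lnd's actual wallet API:

```go
package main

import (
	"fmt"
	"time"
)

type syncResult struct {
	synced        bool
	bestBlockTime int64
	err           error
}

// waitForSync polls the blocking isSynced check until it reports true,
// or until shutdown is closed. It mirrors the shape of the wait loop in
// lnd.go, with the wallet call replaced by a caller-supplied stand-in.
func waitForSync(isSynced func() (bool, int64, error),
	shutdown <-chan struct{}) error {

	resChan := make(chan syncResult, 1)
	for {
		// Run the blocking call in a goroutine so the select
		// below can still react to shutdown.
		go func() {
			synced, t, err := isSynced()
			resChan <- syncResult{synced, t, err}
		}()

		select {
		case <-shutdown:
			return nil

		case res := <-resChan:
			if res.err != nil {
				return res.err
			}
			if res.synced {
				return nil
			}
		}

		// Not synced yet: back off briefly before re-checking.
		select {
		case <-shutdown:
			return nil
		case <-time.After(10 * time.Millisecond):
		}
	}
}

func main() {
	height := 0

	// Stand-in for Wallet.IsSynced: reports synced once the mock
	// height reaches 150.
	isSynced := func() (bool, int64, error) {
		height += 50
		return height >= 150, time.Now().Unix(), nil
	}

	if err := waitForSync(isSynced, make(chan struct{})); err != nil {
		panic(err)
	}
	fmt.Println("synced at height", height)
}
```

The buffered result channel lets an in-flight check complete without leaking a goroutine if shutdown wins the select.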

@hsjoberg
Contributor

hsjoberg commented Oct 17, 2025

Thank you for the analysis of the root cause behind this issue, @mohamedawnallah.

Regarding starting the main RPC server after syncing is done (a93c2d3); I just tested this with Blixt Wallet and (as expected) the RPC is not reachable before the chain is synced.

This kinda breaks Blixt as we let the user play around with the app while it is syncing -- for example the user can send onchain so that they can be onboarded quicker. We also use GetInfo to display the syncing progress in percentage.
Basically the whole app expects the RPC server to be available as we let the user in immediately after creating/unlocking a wallet.

We would really like to be able to use the RPC server before the chain has been synced. The current fix can't really work for us.
I also tried the previous patch rpcserver: avoid false negatives in dispatcher during the initial sync, and this works for us as synced_to_chain goes to true eventually, albeit not ideal as it still spams log entries (and in general it sounds like a worse fix, if I understand the root cause of the issue correctly).

Is there any other way we can solve this one?

@mohamedawnallah mohamedawnallah force-pushed the optimize-neutrino-rescan branch from a93c2d3 to 2fc8729 Compare October 19, 2025 20:15
Comment on lines -738 to -742
// TODO(yy): break the server startup into four steps,
// 1. init the low-level services.
// 2. start the low-level services.
// 3. init the high-level services.
// 4. start the high-level services.
Contributor Author

This appears to be a planned future improvement. If these code changes are approved and merged as-is, it would be beneficial to create a tracking issue for these TODOs, probably by the assignee, to ensure we don't lose track of them.

Member

Hmm, why did we ever start the chain watcher in the newServer function, which should just do init?

@mohamedawnallah
Contributor Author

mohamedawnallah commented Oct 19, 2025

I've heard all of your requests @hsjoberg, @shocknet-justin, @yyforyongyu regarding the importance of not blocking the LND RPC server while fully syncing headers. Could you please help evaluate the newly updated changeset to see if it's working as expected? I've tested it and it seems to be working, but I may have done that evaluation in a tight loop.

@mohamedawnallah mohamedawnallah force-pushed the optimize-neutrino-rescan branch from 2fc8729 to 491c3d2 Compare October 19, 2025 20:29
@mohamedawnallah
Contributor Author

mohamedawnallah commented Oct 19, 2025

The last force push after inviting evaluation of the changes only touches the git commit message title.

@saubyk saubyk requested review from Roasbeef and ziggie1984 and removed request for yyforyongyu October 21, 2025 17:11
@hsjoberg
Contributor

Could you please help evaluate the newly updated changeset to see if it's working as expected? I've tested it and it seems to be working, but I may have done that evaluation in a tight loop.

Hi @mohamedawnallah. I tried the new solution and it's working well for me inside Blixt. Everything is working as expected!

@Roasbeef
Member

To add a bit more context, the neutrino backend for the ChainNotifier is a bit different than the others.

When started, it'll create a new rescan using the backend:

// First, we'll obtain the latest block height of the p2p node. We'll
// start the auto-rescan from this point. Once a caller actually wishes
// to register a chain view, the rescan state will be rewound
// accordingly.
startingPoint, err := n.p2pNode.BestBlock()
if err != nil {
	n.txUpdates.Stop()
	return err
}

startingHeader, err := n.p2pNode.GetBlockHeader(
	&startingPoint.Hash,
)
if err != nil {
	n.txUpdates.Stop()
	return err
}

n.bestBlock.Hash = &startingPoint.Hash
n.bestBlock.Height = startingPoint.Height
n.bestBlock.BlockHeader = startingHeader

n.txNotifier = chainntnfs.NewTxNotifier(
	uint32(n.bestBlock.Height), chainntnfs.ReorgSafetyLimit,
	n.confirmHintCache, n.spendHintCache,
)

// Next, we'll create our set of rescan options. Currently it's
// required that a user MUST set an addr/outpoint/txid when creating a
// rescan. To get around this, we'll add a "zero" outpoint, that won't
// actually be matched.
var zeroInput neutrino.InputWithScript
rescanOptions := []neutrino.RescanOption{
	neutrino.StartBlock(startingPoint),
	neutrino.QuitChan(n.quit),
	neutrino.NotificationHandlers(
		rpcclient.NotificationHandlers{
			OnFilteredBlockConnected:    n.onFilteredBlockConnected,
			OnFilteredBlockDisconnected: n.onFilteredBlockDisconnected,
			OnRedeemingTx:               n.onRelevantTx,
		},
	),
	neutrino.WatchInputs(zeroInput),
}

// Finally, we'll create our rescan struct, start it, and launch all
// the goroutines we need to operate this ChainNotifier instance.
n.chainView = neutrino.NewRescan(
	&neutrino.RescanChainSource{
		ChainService: n.p2pNode,
	},
	rescanOptions...,
)
n.rescanErr = n.chainView.Start()

n.wg.Add(1)
go n.notificationDispatcher()

// Set the active flag now that we've completed the full
// startup.
atomic.StoreInt32(&n.active, 1)

chainntnfs.Log.Debugf("neutrino notifier started")

The start height of this rescan is based on the current best height of the p2p node:

// First, we'll obtain the latest block height of the p2p node. We'll
// start the auto-rescan from this point. Once a caller actually wishes
// to register a chain view, the rescan state will be rewound
// accordingly.
startingPoint, err := n.p2pNode.BestBlock()
if err != nil {
	n.txUpdates.Stop()
	return err
}

If we start this right in newServer, before any syncing has happened, then it appears that it'll start from the genesis height, instead of the current point of syncing.

The correct ordering is that we only call into server.Start() (and therefore chainNotifier.Start()) after the chain backend has fully synced: https://github.com/mohamedawnallah/lnd/blob/491c3d24f4559b922c878bdee592403ae2a8849d/lnd.go#L705-L762
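To make the ordering concern concrete, here is a toy model (none of these types are the real lnd or neutrino APIs): the rescan's start height is pinned at whatever BestBlock() returns when the notifier starts, so starting before header sync completes pins it far behind the eventual tip.

```go
package main

import "fmt"

// toyChain stands in for the neutrino ChainService: its best height
// advances as headers sync. None of this is the real lnd/neutrino API.
type toyChain struct {
	height int
}

func (c *toyChain) BestBlock() int {
	return c.height
}

// notifierStartHeight models how the neutrino ChainNotifier pins its
// rescan start point: it reads the chain's best block exactly once, at
// Start() time, and never re-reads it.
func notifierStartHeight(c *toyChain) int {
	return c.BestBlock()
}

func main() {
	chain := &toyChain{height: 100_000}

	// Regression ordering: notifier started mid-sync, so the rescan
	// is pinned far behind the eventual tip.
	early := notifierStartHeight(chain)

	// Headers finish syncing afterwards.
	chain.height = 850_000

	// Fixed ordering: notifier started only after sync completes.
	late := notifierStartHeight(chain)

	fmt.Println("started early, rescan begins at:", early)
	fmt.Println("started late, rescan begins at:", late)
}
```

The fix in this PR amounts to moving the single BestBlock() read from the first timing to the second.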

@mohamedawnallah mohamedawnallah force-pushed the optimize-neutrino-rescan branch from 491c3d2 to 41ba412 Compare October 22, 2025 17:57
…re rescan

This commit addresses a regression where Neutrino rescanning starts
from an outdated height (~100k blocks behind) instead of using the
current synced height.

Root Cause:
In commit 16a8b62, the initialization order was changed so that
Chain Notifier starts before wallet syncing completes. This means
the rescan begins using the stale height from BuildChainControl
rather than the fully synced height.

Old behavior (commit 1dfb5a0):
1. RPC server starts
2. Headers sync as part of daemon server
3. Chain Notifier starts after sync completes
4. Rescan begins from current (synced) height

Current behavior (regression):
1. Chain Notifier starts in newServer (before RPC)
2. Wallet sync happens after RPC server starts
3. Rescan uses outdated height from BuildChainControl

Solution:
- Ensure headers are fully synced before starting the chain notifier,
and after starting the RPC server.
- Move chain notifier startup to its correct location after headers are
fully synced.
- Make sure the starting beat is lazily called after chain notifier
started and before that starting beat result is used.
@mohamedawnallah mohamedawnallah force-pushed the optimize-neutrino-rescan branch from 41ba412 to f761dc4 Compare October 22, 2025 18:01
@saubyk saubyk moved this from In progress to In review in lnd v0.20 Oct 22, 2025
@mohamedawnallah
Contributor Author

mohamedawnallah commented Oct 22, 2025

This test failure is unrelated to the changeset this PR introduces. It is unconcerning as well; it appears to be a transient error:

--- FAIL: TestLightningNetworkDaemon (308.43s)
    harness_setup.go:25: Setting up HarnessTest...
    harness_setup.go:38: Prepare the miner and mine blocks to activate segwit...
    harness_setup.go:46: Connecting the miner at 127.0.0.1:10023 with the chain backend...
    --- FAIL: TestLightningNetworkDaemon/tranche07/281-of-314/bitcoind/multihop-local_claim_incoming_htlc_anchor_zero_conf (145.61s)
        harness_node.go:395: Starting node (name=Alice) with PID=20781
        harness_node.go:395: Starting node (name=Bob) with PID=20845
        harness_node.go:395: Starting node (name=Carol) with PID=20891
        harness_node.go:395: Starting node (name=Bob) with PID=21130
        harness.go:561: 
            	Error Trace:	/home/runner/work/lnd/lnd/lntest/harness.go:561
            	            				/home/runner/work/lnd/lnd/itest/lnd_multi-hop_force_close_test.go:1778
            	            				/home/runner/work/lnd/lnd/itest/lnd_multi-hop_force_close_test.go:1547
            	            				/home/runner/work/lnd/lnd/lntest/harness.go:315
            	            				/home/runner/work/lnd/lnd/itest/lnd_test.go:130
            	Error:      	Received unexpected error:
            	            	[Alice]: assert shutdown failed in log[.logs-tranche7/13-multihop-local_claim_incoming_htlc_anchor_zero_conf-Alice-023d5a05.log]: node did not shut down properly: found log lines: werRPC Sub-RPC Server
            	            	2025-10-22 18:10:55.122 [INF] RPCS rpcserver.go:1034: Stopping RouterRPC Sub-RPC Server
            	            	2025-10-22 18:10:55.122 [INF] RPCS rpcserver.go:1034: Stopping DevRPC Sub-RPC Server
            	            	2025-10-22 18:10:55.122 [INF] RPCS rpcserver.go:1034: Stopping PeersRPC Sub-RPC Server
            	            	2025-10-22 18:10:55.122 [INF] RPCS rpcserver.go:1034: Stopping SignRPC Sub-RPC Server
            	            	2025-10-22 18:10:55.122 [INF] RPCS rpcserver.go:1034: Stopping VersionRPC Sub-RPC Server
            	            	2025-10-22 18:10:55.122 [INF] RPCS rpcserver.go:1034: Stopping InvoicesRPC Sub-RPC Server
            	            	2025-10-22 18:10:55.122 [INF] RPCS rpcserver.go:1034: Stopping WatchtowerClientRPC Sub-RPC Server
            	            	2025-10-22 18:10:55.122 [INF] RPCS rpcserver.go:1034: Stopping AutopilotRPC Sub-RPC Server
            	            	2025-10-22 18:10:55.122 [INF] RPCS rpcserver.go:1034: Stopping ChainRPC Sub-RPC Server
            	            	2025-10-22 18:10:55.122 [INF] RPCS rpcserver.go:1034: Stopping NeutrinoKitRPC Sub-RPC Server
            	            	2025-10-22 18:10:55.122 [INF] LNWL estimator.go:877: Stopping web API fee estimator
            	Test:       	TestLightningNetworkDaemon/tranche07/281-of-314/bitcoind/multihop-local_claim_incoming_htlc_anchor_zero_conf
            	Messages:   	failed to stop Alice
        harness.go:393: finished test: multihop-local_claim_incoming_htlc_anchor_zero_conf, start height=533, end height=553, mined blocks=20
        harness.go:352: test failed, skipped cleanup
    lnd_test.go:138: Failure time: 2025-10-22 18:12:25.706

@shocknet-justin

shocknet-justin commented Oct 22, 2025

Gave this a try too and it seems to work, thank you.

@mohamedawnallah
Contributor Author

Gave this a try too and it seems to work, thank you.

Thanks @shocknet-justin for confirming the solution is working! Your responsive feedback, along with input from @hsjoberg, was important in resolving this long-standing issue 🙇‍♂️

Member

@Roasbeef Roasbeef left a comment


LGTM 🤴🏾

Collaborator

@ziggie1984 ziggie1984 left a comment


LGTM. I walked through the problem with an LLM to understand it better:

Summary: Neutrino Rescan Optimization Fix

  The Real Problem

  When LND starts with Neutrino backend, there are two independent rescanning processes that serve different purposes:

  1. BTCWallet Rescan (Wallet Transaction History)

  - Purpose: Rebuild wallet's transaction history and UTXO set
  - Scans from: Wallet birthday → current chain tip
  - When: During BuildChainControl → Wallet.Startup() → SynchronizeRPC()
  - Status: Runs in background during the IsSynced() wait loop

  2. ChainNotifier Rescan (Lightning Event Monitoring)

  - Purpose: Provide real-time notifications for Lightning operations (channel confirmations, HTLC tracking, spend notifications)
  - Scans from: Starting height → forward
  - When: At ChainNotifier.Start()
  - Status: Processes every block it encounters

  The Regression

  In commit 16a8b623b, the ChainNotifier startup was moved to happen inside newServer(), which occurs before the header sync completes.

  Broken Flow (OLD CODE):

  1. BuildChainControl
     └─ Neutrino starts, DB at height: 750,000
     └─ BTCWallet begins background rescan from birthday

  2. newServer()
     └─ ChainNotifier.Start() ← TOO EARLY!
        └─ Reads BestBlock() → 750,500 (mid-sync!)
        └─ Begins processing blocks from 750,500

  3. RPC Server starts

  4. IsSynced() wait loop (750,500 → 850,000)
     ├─ Neutrino downloads headers & filters
     ├─ BTCWallet rescans wallet history
     └─ ChainNotifier processes EVERY intermediate block:
        750,501... 750,600... 751,000... 800,000... 849,999... 850,000
        (100,000 blocks processed unnecessarily!)

  5. "Chain backend is fully synced!" at 850,000

  The Waste:

  For every intermediate block during sync (750,500 → 850,000):
  - ✅ Compact filter downloaded by Neutrino (~25 KB)
  - ✅ ChainNotifier receives OnFilteredBlockConnected notification
  - ✅ ChainNotifier processes the notification:
    - Function call overhead (onFilteredBlockConnected() → connectFilteredBlock())
    - Updates internal state (bestBlock height/hash tracking)
    - Checks watch registry for pending notifications
    - Dispatches to block epoch subscribers
    - Updates transaction notifier state
  - ❌ Nothing relevant found (Lightning channels operate at current tip!)
  - ❌ CPU cycles wasted processing 100,000+ irrelevant block notifications
  - ❌ Potential false positives in compact filters could trigger unnecessary full block downloads

  Impact:
  - Wasted processing: ~100,000 block notifications (function calls + state updates)
  - Bandwidth overhead: ~2.5 GB of compact filters downloaded by Neutrino (unavoidable)
  - Extra bandwidth risk: Full blocks downloaded only if filters match watched items (rare but possible)
  - Delayed startup: 10-60 minutes of unnecessary notification processing
  - Poor UX: Wallet appears "stuck" syncing

  Important Note on Bandwidth:

  ChainNotifier does NOT download all full blocks! The overhead is primarily CPU-based:

  - Neutrino downloads headers (8 MB) + compact filters (~2.5 GB) for all blocks regardless
  - ChainNotifier processes the notifications for each block (lightweight metadata)
  - Full blocks (1-4 MB each) are only downloaded when compact filters match watched addresses/UTXOs
  - For a new wallet or node with no active channels, filter matches are rare/non-existent
  - The real waste is 100,000+ function calls and state updates, not bandwidth

  The Fix

  Move ChainNotifier.Start() to occur after the IsSynced() wait completes, ensuring headers are fully synced before ChainNotifier begins monitoring.

  Fixed Flow (NEW CODE):

  1. BuildChainControl
     └─ Neutrino starts, DB at height: 750,000
     └─ BTCWallet begins background rescan from birthday

  2. newServer()
     └─ ChainNotifier NOT started yet ✅

  3. RPC Server starts

  4. IsSynced() wait loop (750,000 → 850,000)
     ├─ Neutrino downloads headers & filters
     ├─ BTCWallet rescans wallet history
     └─ ChainNotifier NOT listening yet ✅
        (No intermediate blocks processed!)

  5. "Chain backend is fully synced!" at 850,000

  6. server.Start()
     └─ ChainNotifier.Start() ← CORRECT TIMING! ✅
        └─ Reads BestBlock() → 850,000 (fully synced!)
        └─ Monitors only NEW blocks from 850,000 onwards

  Benefits:

  - ✅ Zero unnecessary block processing during sync
  - ✅ Faster startup: Avoids 10-60 minutes of wasted notification processing per 100k blocks
  - ✅ Reduced CPU usage: No function calls or state updates for intermediate blocks
  - ✅ Same bandwidth: Neutrino still downloads filters (required), but ChainNotifier doesn't process them
  - ✅ Better UX: Wallet reaches "ready" state faster
  - ✅ Wallet history still complete: BTCWallet rescan happens independently

  Key Insight

  The fix recognizes that:
  - BTCWallet needs historical data (wallet transactions from birthday)
  - ChainNotifier only needs current data (Lightning operations at chain tip)

  By delaying ChainNotifier startup until after sync completes, we avoid processing 100,000+ irrelevant block notifications while still maintaining complete wallet transaction history
   through BTCWallet's independent rescan process.

  The optimization is primarily CPU-focused, not bandwidth-focused. Neutrino downloads headers and compact filters regardless of when ChainNotifier starts. The savings come from
  eliminating 100,000+ unnecessary function calls, state updates, and notification dispatches that ChainNotifier would perform for blocks that are irrelevant to Lightning operations.

  Code Changes

  Removed: startLowLevelServices() call in newServer() that started ChainNotifier early

  Added: ChainNotifier.Start() moved to server.Start() after the sync wait completes

  Result: ChainNotifier only processes blocks from the fully-synced chain tip onwards, eliminating unnecessary notification processing overhead during the initial header sync phase.
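As a quick sanity check on the summary's bandwidth figures (the ~25 KB per-filter size is the summary's own rough estimate, and both block counts are illustrative), the arithmetic can be reproduced directly:

```go
package main

import "fmt"

// filterBandwidthGB estimates the decimal gigabytes needed to download
// one compact filter per block, given a rough per-filter size in KB.
func filterBandwidthGB(blocks, filterSizeKB int) float64 {
	return float64(blocks*filterSizeKB) / 1e6
}

func main() {
	// ~100,000 blocks at ~25 KB per filter, per the summary's figures.
	fmt.Printf("~%.1f GB of compact filters\n", filterBandwidthGB(100_000, 25))

	// A ~900,000-block count (the correction raised later in this
	// thread) scales the same estimate linearly.
	fmt.Printf("~%.1f GB of compact filters\n", filterBandwidthGB(900_000, 25))
}
```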

@Roasbeef Roasbeef merged commit 6a82c61 into lightningnetwork:master Oct 23, 2025
35 of 39 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in lnd v0.20 Oct 23, 2025
@mohamedawnallah
Contributor Author

mohamedawnallah commented Oct 23, 2025

Just a disclaimer: the LLM's reasoning and interpretation above are not correct.

  ...
  Broken Flow (OLD CODE):

  1. BuildChainControl
     └─ Neutrino starts, DB at height: 750,000
     └─ BTCWallet begins background rescan from birthday

  2. newServer()
     └─ ChainNotifier.Start() ← TOO EARLY!
        └─ Reads BestBlock() → 750,500 (mid-sync!)
        └─ Begins processing blocks from 750,500

  3. RPC Server starts

  4. IsSynced() wait loop (750,500 → 850,000)
     ├─ Neutrino downloads headers & filters
     ├─ BTCWallet rescans wallet history
     └─ ChainNotifier processes EVERY intermediate block:
        750,501... 750,600... 751,000... 800,000... 849,999... 850,000
        (100,000 blocks processed unnecessarily!)

  5. "Chain backend is fully synced!" at 850,000
  ....
  Impact:
  - Wasted processing: ~100,000 block notifications (function calls + state updates)
  - Bandwidth overhead: ~2.5 GB of compact filters downloaded by Neutrino (unavoidable)
  - Delayed startup: 10-60 minutes of unnecessary notification processing
 ...

A correct reasoning interpretation can be found here: #10280 (comment) and #10280 (comment)

PS:
Walking through some snippets of the LLM response:

  1. BuildChainControl
    └─ Neutrino starts, DB at height: 750,000
    └─ BTCWallet begins background rescan from birthday

When we reach BuildChainControl we are at ~100,000 headers, not 750,000. If it were the latter, the wallet rescanning issue wouldn't be nearly as noticeable.

  • Wasted processing: ~100,000 block notifications (function calls + state updates)
  • The real waste is 100,000+ function calls and state updates, not bandwidth
    The savings come from eliminating 100,000+ unnecessary function calls

It is ~900,000 wasted block notifications, not ~100,000.

Faster startup: Avoids 10-60 minutes of wasted notification processing per 100k blocks

It is days, not minutes, even under ideal conditions.

@ziggie1984
Collaborator

Yeah, I was more giving an example: a node runner who updated to the buggy software might already have had their neutrino node header-synced to block 750k. But when starting up a new node, we obviously ignore the wallet birthday entirely and would sync from the beginning. Thank you for pointing that out.
